MSCS Thesis Defense - Ankit Gupta

November 18, 2024
3:30pm - 4:30pm
Location: In Person - Newell-Simon 4305
Speaker: ANKIT GUPTA, Master's Student, Computer Science Department, Carnegie Mellon University

Analyzing Multimodal Machine Learning Model Performance and Evaluation Metrics for Medical Report Generation

As a result of recent advancements in foundation models, including large vision-language models, several researchers have explored methods of combining multiple modalities of data as inputs for visual question answering. One key application of visual question answering in the context of the healthcare domain is automated medical report generation, where x-ray images and text-based symptom data for a patient might be provided as inputs, with the intention of generating a relevant medical report as an output. However, very few studies analyze the performance of these models alongside uni-modal encoder-decoder models, and even fewer compare the performance of these multimodal models depending on whether they are provided symptom information as an input. Furthermore, past studies often use simple evaluation metrics that look at n-gram overlaps, such as BLEU and ROUGE scores, which are not effective for generative foundation models that can generate different sentences with the same semantic meaning.

In this paper, we present two main contributions. First, we compare the performance of a variety of approaches for generating medical reports on a dataset of Chest X-Ray medical reports, including an encoder-decoder model, a multimodal model without symptom data, and a multimodal model with symptom data. Second, we design a new metric for evaluating the similarity between generated and reference medical reports using medical term transformation, sentence embeddings, and cosine vector similarity.
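To make the three ingredients of the proposed metric concrete (medical term transformation, sentence embeddings, cosine similarity), here is a minimal Python sketch. It is not the thesis's implementation. The synonym table `TERM_MAP`, the function names, and the example reports are all hypothetical, and the bag-of-words `embed` function is only a dependency-free stand-in for a learned sentence embedding model, which the abstract does not name.

```python
import math
import re
from collections import Counter

# Hypothetical synonym table (an assumption for illustration only):
# maps variant phrasing to a single canonical medical term.
TERM_MAP = {
    "enlarged heart": "cardiomegaly",
    "collapsed lung": "pneumothorax",
    "fluid around the lung": "pleural effusion",
}


def normalize_terms(report: str) -> str:
    """Lowercase the report and rewrite known variants to canonical terms."""
    text = report.lower()
    for variant, canonical in TERM_MAP.items():
        text = text.replace(variant, canonical)
    return text


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: a bag-of-words count vector.

    A real system would use a learned sentence encoder here instead.
    """
    return Counter(re.findall(r"[a-z]+", text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def report_similarity(generated: str, reference: str, normalize: bool = True) -> float:
    """Score a generated report against a reference report."""
    if normalize:
        generated = normalize_terms(generated)
        reference = normalize_terms(reference)
    return cosine(embed(generated), embed(reference))


if __name__ == "__main__":
    generated = "The patient has an enlarged heart."
    reference = "The patient has cardiomegaly."
    print(report_similarity(generated, reference, normalize=False))
    print(report_similarity(generated, reference, normalize=True))
```

In this toy setup, the term transformation step raises the score for two reports that describe the same finding in different words, which is the kind of case where pure n-gram overlap metrics tend to under-score a correct report.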
Our results show that multimodal approaches to medical report generation far outperform encoder-decoder approaches, and providing symptom data slightly improves accuracy for generated medical reports. We also find that our evaluation metric more closely measures similarity between generated and reference medical reports than standard techniques, as evidenced by both quantitative and qualitative case-study comparisons.

This research pushes the frontier of medical report generation by further reinforcing the accuracy benefits of using multimodal models with symptom inputs and introducing a more comprehensive, customized scoring metric for evaluating generated medical reports.

Thesis Committee
Min Xu (Chair)
Martin Zhang
Bryan Wilder

Additional Information
Event Website: https://csd.cmu.edu/calendar/master-of-science-in-computer-science-thesis-defense-ankit-gupta